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ABSTRACT 


In this research study we investigate the possibility of employing parallel 
computing structures for adaptive identification and control. The identification 
algorithm is based on recursive least-squares with covariance resetting and block 
processing technique. It is shown how a parallel processing structure can be obtained 
by recursively applying Givens Rotation to the data matrix. Due to the regularity of 


its structure the approach presented is particularly attractive for VLSI implementation. 
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1. INTRODUCTION 


A. BACKGROUND 

The techniques for on line identification have reached a great degree of 
sophistication in the recent years. Although many types of model structures could be 
used to describe the dynamical behavior of deterministic systems, the major emphasis is 
still on linear models. The whole problem is to try to determine which model best 
describes the dynamics under. observation, in the neighborhood of the desired operating 
condition. | 

In some cases, insight of the model mav be obtained from the laws of Physics, 
Chemistry, or similar. In most cases however, this might not be feasible due to the 
complexity of the physical factors involved. In these instances it may be possible to 
deduce the values of the parameters by observing the nature of the system’s response 
under appropriate experimental conditions. We shall call this procedure parameter 
estimation. 

The issue of parameter estimation plays an essential role in adaptive filtering, 
prediction, and control. The steps involved in a parameter estimation problem are the 
following: 

1. Select an appropriate class of models. 


2. Determine a “cost, function”, i.e., criteria by which the relative performance of 
different models within the class will be judged: 


(2 


Choose the Peon test signals which, reflect all range of desired operating 
conditions. In particular for engineering applications we taik about signals. 
“sufliciently rich” in frequency, which Span the whole frequency range of 
interest. 


4. Select an appropriate estimation algorithm according to on line or off line 
requirements. In the on line case, where the data are processed sequentiailv to 
vield an estimate in real time, the computational complexity is a fundamental 
factor in the choice of the estimation algorithm. 


On 


Make use of prior Knowledge available on the system. I[t is, alwavs 
advantageous to incorporate asS_much prior knowledge as possible into the 
estimation algorithm, in terms of initial conditions and possible constraints in 
the estimated parameters. 

For the particular case of estimating the parameters of linear models, we search within 
the class of models with a given fixed order for the one which minimizes a prediction 
error criterion. Test inputs are usually given in the form of white noise or sum of 


sinusoids within the frequency range of interest. 
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For what the chcice of the estimation algorithm is concerned, extensive 
simulations show that techniques based on recursive least-squares present a much 
faster convergence rate than other algorithms, such as projection algorithm (Goodwin 
and Sin) [Ref. 1: p. 62]. The.reason beyond this is the fact. that the cost function 
associated with the former includes all past data (and so it has memory) while the 
latter is memoryless. The result is an improved convergence rate of the estimated 
parameters, an improved tracking for systems with time varving parameters, and better 
robustness in the presence of noise [Ref. 2: pp. 1-10] and unmodelled high frequency 
dynamics at the expenses of the need for more computing power. 

The origins of the least-squares method can be traced back to Gauss [Ref. 3: p. 
63]. In its recursive version it has been formulated independently by several authors. 
The original reference seems to be Plakett [Ref. 4: p. 149]. An early treatment of the 
least-squares method applied to dvnamic system identification can be found in Astrom 
[Ref. 5: pp. 123-130] and many related papers dealing with different aspects of the 
least-squares method can be found in the literature. A comprehensive treatment of 
adaptive techniques for identification is given in Ljung [Ref. 3: p. 17-20]. 

The major drawback of recursive least-squares identification is its cost in terms of 
computational complexity, which might make it unsuitable for real time identification 
of a large number of parameters. In fact it requires on line matrix manipulations. 
whose size grows with the dvnamical complexity of the svstem to be estimated. The 
situation becomes even worse when on line identifications techniques are applied to 
multivariable systems, where the number of unknown parameters grow also with the 
number of inputs and outputs. In this respect the limitations due to serial 
computation is evident. From this list of implementational problems, the need for 
adequate computing capabilities is a crucial point for the implementation of on-line 
estimation techniques. 

In this research we analyze an on line recursive least-squares identification 
algorithm which processes sequential “blocks” of data. In particular the parameter 
estimates are determined by the least-squares solution of a redundant number of linear 
equations obtained from the measured data. Existing techniques of solutions for this 
class of equations based on QR decomposition are particularly attractive for parallel 
processing applications. In the next section we introduce the concept of QR 


decomposition and its significance for parallel computation. 


1] 


B. QR DECOMPOSITION 

A numerically attractive method for the least-squares solution of linear equations 
is given by the QR decomposition of a matrix. To illustrate the concep: consider the 
system of linear equations 


AQ=8 (1.1) 


in the unknown 9, with 8 an n-dimensional vector and B m-dimensional, where m>n 
(more equations than unknowns). The equal sign in (1.1) 1s intended to be in a least- 
squares sense. That is to say that given A, B the vector 9 in (1.1) is the one which 
minimizes the square error | A989 — B |. By using the QR decomposition (Ref. 6: pp. 


143-144] we can always decompose a given m Xn matrix A as 


A = Q& 


with Q an mXm orthogonal matrix, Le.., QQ! = I, [ being the identity matrix of 
appropriate dimensions, and & of the form 
R 
R=] ---- 


O 


with R nXn upper triangular. It can be shown that any solution of the equation (1.1) 


in the least squares sense 1s also solution of the system of n equations and n unknowns 


R9= By (Ee 


with B, obtained from Q and B. Also if the matrix A is full rank, the matrix R is 
invertible and the unique solution 9 can be computed from (1.2). 

The important points are the facts that equation (1.2) does not require explicit 
matrix inversion and can be solved by successive substitution, and a large number of 
techniques can be used to arrive to equation (1.2). In particular the Givens Rotation 
without square root [Ref. 7: pp. 215-218] to be presented in the subsequent chapters 1s 


attractive for systolic arrays implementation. 
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C. SYSTOLIC ARRAY 

An efficient way of handling such procedure in QR decomposition is by using 
parallel processing structures, such as systolic arrays or wavefront arrays. The idea of 
a systolic array was developed by Kung and Leiserson. [Ref. 8: pp. 256-281] It consists 
of an array of individual processing cells that are arranged in the form of a regular 
structure. Each cell in the array is provided with local memory of its own and each cell 
is connected only to its nearest neighbors. The array is designed such that wavefronts 
of data are clocked throughout in a highly rhythmic fashion, much like the pumping 
action of a human heart; hence, the name “Svstolic’. These systolic arrays enjoy 
simple and regular communication paths, and almost all processors used in the 
networks are identical. As a result, special purpose hardware based on systolic arrays 
can be built inexpensively using VLSI technology. What is interesting in high- 
performance parallel structures is that it can be implemented directly as a dedicated 
hardware device. 

In this research, the recursive algorithm, based on recursive least-squares with 
covariance resetting, is redesigned in order to make it suitable for implementation on a 
VLSI chip and also modify the systolic arrav in order to reduce computing time and 
complexity. The difference in adaptive control is that the estimator has to operate 
recursively on subsequent blocks of data, so that the estimated parameters converge 
asymptotically to the respective correct values. Asymptotic convergence is guaranteed 
by initializing each estimation with the parameter values from the previous one, and by 
persistency of excitation of the external inputs. 

This research report is divided as follows; Chapter II discusses on line estimation 
and recursive least-squares, while Chapter III analyzes the parallel algorithm using QR 
decomposition and systolic array. In Chapter IV. shows the simulation results of 


parallel algorithm, comparison and conclusion is discussed in Chapter V. 
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Il. RECURSIVE LEAST-SQUARES ALGORITHM 


A. LINEAR MODELS FOR DISCRETE TIME SYSTEMS 

Consider a dynamical system with single input signal u(t) and single output signal 
y(t). In a sampled data framework these are numerical sequences obtained from 
sampling continuous time signals at a frequency W,. Without loss of generality 
assume the sampling interval T to be one time unit, and let the time index be t = 


1,2,3,......- A linear discrete time model of the discrete time system is givenoyenme 


difference equation 


y(t) + ayy(t-1) +... © aM t-n) = Oy U(C]) © 2s ey oe (220 


where v(t) denotes a disturbance term to be specified. We shall use operator notation 


for conveniently writing difference equations. Thus let qo! be the backward shift or 


time delay operator 


ql x(t) = x(t-l) (2.2) 


Then (2.1) can be written as 


A(q7 Dy(t) = B(q7 u(t) + v(t) (2.3) 


where A(q7h) and Bq!) are polynomials in the delay operator: 


A(q?l) =[+ ace as Sse + ane 


I 


: : =D : 
B(q ly b1q Ly bog - aabeag 7” 


The model (2.1) or (2.3) describes the dynamic relations between the input and the 
output signals. In particular the polynomials A.B model the linear part. while the 


disturbance term v models disturbances(at the input and output), nonlinearities, and 


I4 


anything which is not accounted bv the linear model. An alternative way to represent 


the linear model is by defining the parameter vector. 


gl = (Ay yeeees A104 4-10) 


the regression vector of lagged input-output data, 


ol(t) = [ -y(t-1), .., -y(t-n), u(t-1),..., u(t-m) ] (2.4) 


and write (2.3) as 


y(t) = OF ot) + v(t) (2.5) 


Beyond the clear relation between (2.3) and (2.5), the significance of the regression 
model is two folded: 
a) it highlights the linear relationship between the observed signals y(t), p(t) and the 
parameters 8 of the model, and 
b) if we assume v(t) to be a sequence of n uncorrelated random variables (such as 
white noise) the predicted value of y(t) given the measurements can be easily 


expressed as 


y(t) = OF eit) (2.6) 


In this research we are interested in the problem of estimating the parameter 9 from 


measurements of the input,output sequences. 


B. ON LINE PARAMETER ESTIMATION 

It is possible to estimate the values of the model parameters 8 by observing the 
nature of the system’s response under appropriate experimental conditions. The idea 
for the recursive identification problem is to compare the output with that of an 
adjustable model, and update the parameters until the error between the outputs of 
plant and model is minimized. in some sense. The procedure is schematically described 


in Figure 2.1. 


10S. 
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- Let us consider the difference equation model in (2.5). We desire to estimate the 
parameter vector 9 from a set of measured data y(t), @(t); t = 1, 2. ..., N. In a least- 
squares framework we choose this estimate by best fitting the linear model to the 


aVathuble data. That ts, we define a criterion function 


l y 
V(8) = —* af y(t) — OF p(t) 17 (2.7) 


to be minimized with respect to 0. 

The inclusion of the coefficient @, in the criterion (2.7) allows us to give different 
weights to diflerent observations. In applications, most often d, is either identically 
equal to | or expresses a forgetting factor of the form A‘ in order to give a higher 


Wetaht to more recent data, =| Rela 3) pee 


A 
If we denote by y(t|0) = 0! ot as the prediction of v(t), given the parameter 
vector 0, the criterion (2.7) can be seen as an attempt to choose a model that produces 
the best prediction for the output signal in the least-squares sense. The cost function 


Vx can be minimized analytically with respect to 6, to obtain the estimate 
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Q(X) = iat aor (ny! Yao (2.8) 


which is defined, provided the inverse of the matrix involved exists. To obtain a 


recursive solution (2.8) denote 


Rit) = Yg (ok) T()) 


and, from (2.8), write 


PS ce (k)y(k) = R(t~ 1) a(t— 1). 


From the definition of R(t) it follows that 


R(t— 1) = R(t) — a, o(t@ *(t) 


hence standard algebraic manipulation lead to the recursion 


a(t) 


R(t) S PS) ae Oy) 
(2.9) 


A(t-1) + R(jecda, [¥(t) — OMe Hert) | 


The algorithm in (2.5) is not well suited for computation as it stands, since a matrix 


has to be inverted in each time step. It 1s more convenient to introduce the matrix 
P(t) = Rol(t) 


and update P(t) directly on the basis of the matrix inversion lemma [Ref. 3: p. 19] This 


leads to the recursion 


Pt Vea es 
P(t) = P(t-l) — ee ee OE (2.10) 


La, + o!(t)P(t— Lit) 


The advantage using of (2.10) is that the inversion of a square matrix of size equal to 
dim 0 is replaced by the scalar division. Thus the least-squares estimate 6(t) defined bv 


(2.8) can be recursively calculated as 
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A A —1\ A 

(ty = A(t 1) + —— (y(t) T(t 1o(t) | 
i 
L'a, + @(t)P(t-1)@(t) 2.11) 
= I = 

ioe Hee P(t= Dele ()P(t= 1) 


ia, + ol (yP(t— Heit) 


The problem with this algorithm is that the matrix R(t) might not be invertible 
and therefore P(t) might not exist. This is the case when t S dim Qq, since in this case 


we can write 


R(t) = [ @(1), .--, O(t) J : -oo! 
l(t) 
when o! has more columns than rows(as in the case of t S dim @) we can always 


find a non zero vector X of appropriate dimension such that 


oO 
which shows R(t) to be a singular matrix whenever t S dim @. This problem of 
existance is addressed in the next section, where the recursive least-squares algorithm 1s 


modified to accommodate initial conditions and block processing techniques. 


C. RECURSIVE LEAST-SQUARES ALGORITHM 


The algorithm (2.11) can still be applied if we modify the cost function (2.7) as 


I L 


2 








Vi) = Tv) oT (oo? + —0- HOPE "@- BED 


tJ 


C 


where P(O) and 0(0) are the initial conditions of P and @ respectively. Basically, the 
cost in (2.12) represents the sum of squares of errors e(t) = v(t) — o!(ty0, Which 1s 
the difference between the actual observation y(t) and the value predicted by the model 
with parameter vector 8. The second term on the right hand side of (2.12) has been 
included to account for the initial parameter estimate. Relevant observations for the 


recursive least-squares are given in the remainder of the section. 


Observation 1: The recursive algorithm requires initial values for terms 9 and P 
as 6(0) and P(0) in order to start the recursion at t=0. The relative importance of the 
initial values on the recursive estimate decays with time, since the leftmost sum in 
(2.12) becomes preponderant with N encreasing. At the same time the entries of the 
matrix P become smail with t - © for anv initial conditions P(0). 

The initial value P(O) of the matrix P appears in the cost function (2.12) as the 
weight given to the initial estimate 9(0), and it has to be positive definite. For 
convenience it is often chosen as P(0) = Cel, where C is some positive constant. In 
particular P(0) expresses the “confidence” we have on the initial conditions 0(0), and it 
is set to have large entries if we have poor confidence on the initial conditions 0(0). 
This can be easily seen in the cost function (2.12) where a very large P(0) would give a 
very small weight to the term 8 — 8(0) associated with the initial conditions. 

Observation 2: The least-squares algorithm generally has much faster 
convergence than other algorithms, such as the projection algorithm, and it can be 
used essentially unaltered with noisy signals [Ref. |: P. 62]. We now discuss a number 
of variants of the least-squares algorithm. A problem of the recursive least-squares 
algorithms defined in (2.11) 1s that in general the matrix P(t) tends to zero ast ~ ©. 
This can be seen from the definition of P(t) in (2.11) and the matrix inversion lemma 


which yields 
p(t)! = p(r— 1)! + @(tio h(n) 


in the scalar case (tio! (t) is a non-negative quantity, and therefore the sequence 
P(t)! is nondecreasing and might tend to infinity (i.e..P(t) might go to zero). The 
same result holds in the non-scalar case shown in (Goodwin and Sin) [Ref. |: pp. 
34-58]. The consequence of P(t) — 0 is that the algorithm becomes less sensitive as t 
— %. There are several wavs to cope with this problem. We can either discount past 
data with a forgetting factor, or periodically reset the covariance matrix to nonzero 
constant values. The effect of these two modifications is the ability to apply the 
estimation algorithm to systems with time varying dynamics. 

Although the algorithm presented and related analysis assumes a SISO plant for 
simplicity, they can be extended directly to the multivariable case (MIMO). 
Furthermore the plant is assumed to be causal, having unknown parameters and 
known order. The problem of order determination and validation involves criteria 
which go beyond the scope of this research. Therefore it will be always assumed that 


the order of the chosen model is a parameter given to the designer. © 
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D. RECURSIVE LEAST-SQUARES WITH COVARIANCE RESETTING 

The approach in this research 1s based on recursive least-squares with periodic 
covariance resetting, and block processing. By block processing approach we mean 
that the time set Z. 1s divided into segments I, of fixed length N as 


2 Se 
pretence 


where 
ie ieee (kelN St = KN] 


and the parameter vector @(the coefficients of A and B) in (2.3) are kept constant 
Within each interval I, . 
The serial implementation call for the estimation @(t) by the recursive least-squares 


shown in equation (2.11). 


Q(t) = @(t—-1) + eae ry(t) - 8 1(t— Hort) | 
: ) 
1+ @l(t)P(t- Lot) yy 
= a ae 
P(t) = Pltel) — Pit= DOMO- (hr) 


L + o!(t)P(t— Let) 


and since the adaptive gains are updated at t = KN only (once for every block), only 
the estimates 81; are considered. 

There are several reasons which justify the block processing approach. For 
example one application of on line estimation is found in adaptive control. 
As shown in Figure 2.2, in this class of problems, the output of the on line identifier ts 
used to set the values of the parameters of a linear compensator. In an adaptive 
control context it has been shown that an external command u(t) in (2.1) sufficiently 
rich in frequency, together with blocks I, of sufficient length N provide a guarantee for 


a consistent estimation as 


6(t) > 9° (Q° representing actual parameters). 
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Estimator 


Compensator 





Figure 2.2 Adaptive Control. 


As mentioned in the previous section the parameter estimation we consider is 
such that the covariance matrix P, — 0 and the estimator in (2.13) looses sensitivity to 
variation of 8 ast + © {Ref. 1: p. 65]. Although the covariance matrix P, can be 
reset at anv time, for convenience of analysis we assume that it is reset at the beginning 


of each block as 


Phi) = O97! (2.14) 


with I the tdentity matrix and o, an arbitrary positive constant. The result discussed 


above are formalized in the following theorem, proved in [Ref. 1: pp. 60-61). 


Theorem 
Let a SISO discrete time linear system be modeled by the difference equation 
(2.1) with v(t) = 0. Also let 


1. n= degree A(q7!)= degree B(q7!). 

2. A(0) = 1, B(O) = 0 (i.e. the system is causal). 

3. let @ = [A, Bt A, B being the coefficients of A(q7!) and B(q7)). 
4. 


the coefficients 2c Aq), Bia) in (2.3) be held constant within the time 
intervals Ly, Wit {f ae 
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5. the external command sequence u(t) have ai ieast 4n different sinuso:dal 
coniponents. 


6. the estimated sequence @(t) be as in (2.13). 
Then under these conditions, lin ®, = @ exponentially. 
t+ 00 


The extension to bounded disturbances v(t) 0 can be shown under the condition that 


|v(t)| is sufficiently small compared with the input and output signals. 
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Il. APPROACH TO PARALLEL ALGORITHM 


A. IMPLEMENTATION OF THE PARALLEL ALGORITHM _ 

The real time identification algorithm with recursive least-squares, covariance 
resetting and block processing in equation (2.13) and (2.14) is redesigned for parallel 
implementation. Bvy definition, using the minimization of quadratic cost function in 


(2.12), the estimate @(t) at time t minimizes the quadratic cost function 


Vi K(O,8n ) = 


Sa cd o(j) 10 I + (0- Ox Pen yt 9-6. ) 


with t € en (re NE t<(k+ fyNjeand Pent = a I. The fact that the 


covariance matrix P, is reset to initial conditions at each t=KkN-1l, is equivalent to 
least-squares estimation in block I, 4 , with initial conditions Or. Fort = (k+1)N-1 


the cost function V in (3.1) can be written as 


V,(8) = W,(0)? W,(8) (3.2) 


where 


W,(6)! = [ er (8), ae fk + 1)N-1(9) 7 6 (8 : 6.x)" | 
e(8) = y(t) — e(t)'@ 


By the above definitions we can write W,, as 


Ls 


y(kK+1N-1) | a 





W,.(9) = J Q (3.3) 
YBN) ~ FONT 
0. 
; 08x O | 
ae By 7 A,@ 


with B, e REFN) *1 ge RI*! A, e RIGT A) *4 where d is dimension of @. 


The algorithm we analyze computes 9,.); by minimizing the cost function in (3.1) 
based on a QR decomposition of the matrix A, in (3.3). The estimated sequence On\ 
so obtained is therefore identical to the one obtained by recursive least-squares with 


covariance resetting in (2.13),(2.14) at t = KN. 


B. SOLUTION OF THE LEAST-SQUARES PROBLEM USING QR 
DECOMPOSITION : j 


A numerically attractive method of solving the least-squares error problem is the 
QR decomposition of a given matrix. Bv applving the QR decomposition method 
described in the introduction, at each time block k define Q, and A, as the QR 


decomposition factors of A, as 


OA ae om Gay 


or equivalently 


Ay = Oy” & (3.5) 


Waluel (Oa R(d+N) Xd an orthogonal matrix so that 
On ad Q, 7! 


and Ry & R(d+N) xd partitioned as 
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we |e (3.6) 


Ry € pd xd upper triangular matrix, and O the null matrix. Applying the same 
transformation Q; to By in (3.3) partition the result as 


Ba 








Piths,; € po~ i. By definition of V, in (3.2) and orthogonality of Q, we can write 
V,(0) = W,1(0)Q; Q, W,(8) 


eno by in (3.3) - (3.7), 


=aseea uw 2. oe 











where B, 5 is independent of 8. Therefore 
V,(0) = | RO — Buy I? + | Bgl? 


In the above equation if the matrix R;, is nonsingular then the cost function V, is 


minimized by 


a =) | 
8+ LN = Ry By (3.8) 


Notice that equation (3.8) does not require anv explicit matrix inversion since Ry 
is in upper triangular form. Therefore by the QR decomposition introduced above, we 
can compute OK + LYN using two processors P1,P2 in cascade as in Figure 3.1. 

In Figure 3.1, processor Pl computes R,, By,, while processor P2 computes § from 
(3.8). In the next section we see how use parallel computational structure to compute 


Ry, By, and solve equation (3.8). 


a> 





Figure 3.1 Two Step Computation for 0... 


C. PARALLEL SOLUTION OF THE LEAST-SQUARES ALGORITHM 

This section is devoted to the solution of (3.8) using a parallel structure, and we 
introduce an algorithm suitable to the computation of the QR factorization of any 
given matrix of dimension nXm (n>m). Although several techniques exist in the 
literature, the Givens Rotation ts prefered in paralle} computing structures, since it 
successively manipulates two adjacent rows of the matrix at a time. For this purpose 
we define the Givens operator Q(p.q), which performs a rotation of 2 X | dimensional 
subvectors of the matrix A in order to annihilate the element Aap’ In the applications 
to parallel processing, the rotation 1s determined based on adjacent elements of the first 
column, then it propagates to the successive right columns of the matrix. The Givens 
rotation provides matrix triangularization by aflecting two rows at a time and the 
Operations are limited to neighboring data in the matrix. Basis of the Givens rotation 


is the matrix 


I, 0 0 
OCD a) a Ot 35a) (3.9) 
Caen 5 
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associated to each pair of indexes p,q & [1,d+1j, with I, and I, identity matrices of 
: . -. ax 
dimensions (q-2) X(q-2) and (d+ 1-q) x(d+1-q) respectiveiy, and r € R7 2 of the 


form 


c(p.q) (p,q) (3.10) 


NP) = -s(p.q)  ¢(p,q) 


Application of the transformation Q(p,q) to any matrix A of appropriate dimensions, 


yields 
Meeeex |X 
ean = | x i x (3.11) 
Kee Xx =—q 
Rae ey OX 
t 
P 
mroviaed the terms c(p,q), s(p.q) in (3.10) are such that 
— = 
c(p,q) Sool s( p,q) tap I (3712) 
(Pq) Agn — S(P»4) Ag.1 4 = 9 
which yields the unique solution of rotation parameters 
ag. 
aa + age 
q-l,p qp (3.13) 
a 
i 
4q-Lp * agp 


the notation x 1n (3.11) just indicates other terms of the matrix. 
Observation: Notice that at each time block k, the matrix A, as in (3.3) has full 


rank provided the parameter o, 1s nonzero. Therefore the least-squares solution of 


O 
(3.3) is always unique for each block k, and as a consequence the diagonal elements of 


R, are always nonzero. 


ee 


An example of appication ot the Givens Rotation in a three dimensional case is 
the following 


Sg oes 7 7) Vos 
ee >on 411 “12 4B 
0 0 133 = Q(3,4)Q(2,3)Q(1,2) 0 ay a93 (3.14) 
0 0 0) 0 0 A333 


Each transformation Q in (3.14) is performed in two steps: At first, we compute c and 
s defined in (3.13) and then we perform the linear combination of rows q and q-l of 
the A matrix in order to obtain the upper triangular matrix R, in (3.14). By using this 
idea, We can implement the solution of (3.8) in a parallel structure. For each time 


block I, define the initial matrices 


(3.15) 


0. being the estimate from the previous block of data I,_). Then Ry in (3.8) can be 


determined as Ry = Ro where Ri is recursively computed as 





Tren j 
Tey = J ee ,j = 0,1,....N-1 
Ri kK 10 
a . (3.16) 
y(KN +)) By y | 
aoeae aie = Q ae | 
2] 2k? 


These operations are performed by the network of cell processors shown in 
Figure 3.2. The processors work simultaneously at the same clock pulse in each cell 
and their operations will be described by the transition and output function at the next 
section. In Figure 3.2, the regression vectors @(t) are at the top of the array, and the 
processors operate with the given data simultaneously. The details of the 


implementation are given in the next section. 
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Figure 3.2. Systolic Array. 


D. SYSTOLIC ARRAY IMPLEMENTATION FOR PARALLEL STRUCTURES 
1. General Description of Systolic Arrays 

[In this section we show how the least-squares algorithm presented above can 
be implemented in a systolic array structure. In particular we aim at a computer 
Structure which accepts as input the sequences of regression vectors @(n) and signal 
y(n) and outputs least-squares estimates of the parameter 9 by block processing. The 
two processors described above (P1, P2. in Fig. 3.1) can be implemented as arrays of 
cells as shown in Figure 3.3, where the particular case of d=4 is presented. In the 
block processing we had mentioned in Chapter II, the processor Pl(triangular section) 
Operates during the given block time, while P2(linear section) operates only starting at 
the “reset” signal at the end of each block, all data in the triangular section flow to the 
linear section. 

The entire systolic array is controlled by a single clock signal. Each array 


consists of two types of processing cells; internal cells (represented by squares) and 


es 
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Figure 3.3 Systolic Array [Implementation for Recursive Least-Squares Algorithm. 


boundary cell (represented by circles). The specific arithmetic function of the cells will 
be defined later. The triangular systolic array section implements the Givens Rotation 
part of the recursive least-squares algorithm, Whereas the linear svstolic array section 
computes the least-squares Weight veetor at the end of the entire recursion without anv 


matrix inversion. 
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2. Triangular Systolic Array 

The triangular systolic arrav performs the Givens Rotation described above. 
In particular the boundary cells compute the rotation parameters c and s as in 
Equation (3.12), while the internal cells compute the linear combination of adjacent 
rows. The efficiency of the systolic array comes from the fact that each cell operates 
continuously on the data. In fact once a term in the matrix is updated and the results 
are passed to the cells on the right and below, new data can be immediately fed and 
computed. The mutual relation among the computing stages of different cells at the 
Same time t is shown in Figure 3.3 where the label on each cell (1,)) indicates the order 


in which the data enter the array. : 





Figure 3.4 Computing Sequence of Triangular Array. 


The operation of the triangular svstolic array are summarized in Figure 3.5 which 
shows the boundary and internal cells. Basically, the internal cells perform only 
additions and multiplications, the boundary cells are considerably more complex in the 


sense that they compute three multiplications, one addition and one division at each 
Sycle. 
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Figure 3.5 Cell Definition for a Triangular Systolic for Performing Orthogonal Triangularizatior 


Each cell of the triangular systolic array section stores a particular element of the 
upper triangular matrix R(n) in (3.6) and it is mitialized to zero for internal cells and 
On for the boundary cells as mentioned in (3.15). The function of each row of 
processing cells in the triangular systolic array section is to combine one row of the 
stored triangular matrix with a vector of data received from the above cells in such a 


way that the leading element of the received data vector is annihilated. The data 


a2 


vector so obtained is then passed downward on to the next row of cells. The boundary 
cells in each row of the section computes the pertinent rotation parameters and then 
passes them to the right on the next clock cycle, as shown in Figure 3.5. The internal 
cells subsequently apply the same rotation parameter to all other elements in the 
received data vector. 

Since a delav of one clock cycle per cell is incurred in passing the rotation 
parameters to the right along a row, it is necessary that the input data vectors enter the 
triangular systolic array in a skewed order, as illustrated in Figure 3.3. This 
arrangement of the input data ensures that each column vector @(n) of the data matrix 
propagates through the array, it interacts with the previously stored triangular matrix 
R(n-1) and undergoes the sequence of Givens Rotation Q(n). 

At the same time as the orthogonal triangularization process being performed 
by the triangular svstolic arrav section. the column vector B(n) is computed by the 
rightmost cofumn of internal cells. In effect, this computation is-made bv treating the 
desired response vector Y({n) as an additional column that is appended to the data 
matrix @(n) at its right end. When the entire orthogonal triangularization process is 
complesed, each particular row of the upper triangular d < d matrix R(k) and the 
associated d X 1 vector 8(k) along the corresponding wavetront is clocked out for 
subsequent processing by the linear systolic array section at the last clock time of the 
block processing period I. 

3. Linear Systolic Array 

The linear svstolic array section consists of one boundarv cell and (d-1) 

internal cells that perform the arithmetic function defined in Figure 3.6 in accordance 


with equation (3.17). 


Zi\(nt1) = Zin) + ryi(k) O;(k) (3.17) 
es) — 8k) — Z(n) 


where 1 is from 1 to d-1, k is current NO. of block, n denotes the clock cvcle in the 
linear section, Z(n) are intermediate variables, and rii(k) are elements of upper 
triangular matrix R(k). B(k) are element of the vector B(k) and 0.(k) are element of 
least-squares weight vector 9(k). This section of the processor computes 9(k) by using 
the method of backward substitution [Ref. 8: pp. 272-274). 
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Figure 3.6 Cells for Linear Systolic arrav. 


The boundary cell performs subtraction and division, whereas the internal cell 
perform additions and multiplications. The elements of the least-squares weight vector 
appear at the output of the boundary cell at different clock cycles, with B:(k) leaving 
this cell first, followed by B,_,(k), and so on right Ce _ In elfect, tlie eclementuam 
the weight vector 0-(k) are read out backward, t.e., 1 = d-l,..., 2, 1. 
4. Solving The Estimated Parameters 

Once the triangular matrix R(k) and the corresponding vector B(k) are 
computed, next task 1s to transfer the data to the second processor(the linear array) 
and solve for the estimated parameters. This operation is performed at the end of each 
time block(t = kN). [In order to shift the entries of the matrix R(k) and vector B(k) of 
the triangular array it is easy to see from the ceil processors in Figure 3.5, that values 
c=0, s=-t of the rotation parameters cause the data in the cells to shift to the cell 
below. This operation can be successively propagated to all cells on the right. This 
operation ts triggered by the “reset” command at the end of each time block. The reset 
command 1s given as an impulse to the boundary cells in the triangular systolic array. 
At the next of the reset command, the rotation parameters in the boundary cells are 
generated the values c= 1, s=OQ and pass them to rightward in order to sluft down all 
stored datas in the triangular section without any computation in the internal cells. 
Let us consider the case where the dimension of @(k) is 4(d=4). After the “reset” 


command, the entries of 4X4 matrix R(k) and 4% 1 vector B(k) are shifted down to 
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the linear array with timie sequence as shown in Figure 3.4. All cells in the linear arrav 
are initialized te zero. The variable Z;(n) are forced leftward through the processor 
with accumulation inner product terms in the rest of the processor as it moves to the 
left in Equation (3.17) while FF and b; are lowered as indicated in Figure 3.7. The left 
end processor is dilferent in that it computes 0k) by function in equation 3.17. At 
any time b: reaches the boundary cell, it is combined with Z(n) and the processor 


computes the weight vector @(k). 
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Figure 3.7 The Linearly connected Systolic Array. 


We generate the element 0,(k), 84.108) and 8, (k) of the least-squares weight vector 
O(k), in that order, at the output of the boundary cell, and transterred the matrix B(k) 
as a part of the initial condition of the next block processing I, 4.) as shown in Figure 
3.8. [Ref. 10: pp. 514-515] 


u(t) 
ne < Data !nput and Initialize 





Figure 3.8 Overall Computation Structure. 


In the case of SISO, the triangular systolic array operates during the interval 
I,. Another time period is required for computation of the parameter 0,(k) in the 
linear systolic array section. It turns out that the exact additional clock times for the 
linear systolic section(represented by 1)) is proportional to the dimension of 
p(n)(l) = 2d) and can be expressed as a O(d) which 1s showed in Figure 3.9. In fact the 
solution of the triangular system requires a number of d substitutions. 
To summarize, the total clock time required for computation of O(k) is N + 2d. The 
Operation tnvolved at the individual clock cycles of the linear array section are 


described tn Figure 3.10 for the case of d= 4. 
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Figure 3.9 Block Processing. 


From the above description we see that advantages and disadvantages are 
associated to this parallel estimation scheme. A clear advantage is the parallelism in 
computation, as opposed to the serial nature of recursive least-squares, which makes it 
attractive when a large number of parameters has to be estimated. A disadvantage is 
that this technique is effective only in conjunction with block processing. This is due 
to the fact that we always need the time to solve for the linear system in the linear 
section. Therefore the parameter estimates are updated only at the end of the block 
time, contrary to the serial case when updated estimates of the parameters are available 


at any time. 


37 


BCUNDARY INTERNAL 
CEL Geers CEES CeErs 


re 


ScD 


a ; (i 
ae Be. ce '—— 














LW AS) : 7 | 
' ~~ A 
A +—— +_— ——+—— + 2, (3)=ry, 6, 
N-3 & zs 5 > a7 3 
. co 4 | : | 
' . - ; | 
Ys Z 5 Mile : 8, =5,-7, Gi 
Oa = ae 3 : 
Ie ——— de > | : 
; SD S: ea) ge ae iS hae | 
c 3) = ee 
PGS ae Me a 3 (8) 3 as 
<~—_— $e ~<—<— +> oe 
= | ‘ 
os OFS Sy. 
ae 2 3(5)= 8 tig 
yO. o> ae ye = 272 - ieee 
> (Ne ew ee a Billo) ie) r 
eS ake ! > | een << 
pe SS LF x es + 27 e 
2147) Y I2 ‘ 
——f  ~_— ta + C7) sce 
N+7 —_— > = ——-> 
J 9 | ; +1, (ome 
yo, 3 
~—_——— ~<j--—- whe am 
- @ | 8 = pea 
/ hinciemeeaet = ite 
PA 
Co, 





Figure 3.10 Operation of The Linear Systolic Array ford = 4. 
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IV. SIMULATION STUDY 


A. MODEL SET 

The linear model in general can be expressed by either the difference equation 
relating the input and output sequences, or in a regression form as in the previous 
Chapters. In the examples below we consider the problem of estimating the 
parameters of a third order model using the systolic technique descrited in the previous 


Chapter. Consider the class of models with difference equation 


y(t) a ayy(t-1) ae a5y(t-2) ae a3¥(t-3) = byu(t-1) = v(t) (4.1) 


where u(t), y(t) are the input and output sequences and v(t) 1s an added disturbance. 


An alternative way to express (4.1) is obtained by defining the vectors 


gl a ( ays a4, a3, by ) 
(4.2) 
l(t) = [ -y(t-1), -y(t-2), -y(t-3), u(t-1) | 
which leads to the regression model 
y(t) = OF oct) + v(t) (4.3) 


This model describes the observed variable y(t) as linear combination of the 
components of the observed vector @(t) plus noise. In the parameter estimation 
problem the values in 9 are assumed to be unknown. 
1. Noise Mode! 
The most realistic model has to include noise in system and measurement. In 
general, v(t) in equation (4.1) is a sequence of independent random variables with zero 
mean values, for an ideal “white noise” source. When the noise is “colored” (i.e., its 


samples are correlated) we need to encrease the order of the difference equation to 


BY 


include the dynamics of the noise model. In this case a higher modei order is required 
which incorporate also noise dynamics [Ref. 3: p. 265]. From a physical point of view 
it is Common to work with disturbances that are “measurement errors” or “output 


error’ expressed in equation (4.4) 


B(q7!) 


x 


y(e= “aq? NG (4.4) 


the difference between this model and equation 4.1 1s also illustrated in Figure 4.1. 


| v(t) 


| v(t) 
u(t) B y(t) 





(a} (D) 


Figure 4.1 Noise Model -(a) Equation Error (b) Output Error. 


Identification methods that use the model in equation (4.4) are often known as output 
error methods or model reference identification methods. In this research, we simulate 
two types of model using parallel identification algorithm. 
2. Choice of Initial Value in The Systolic Array 

For a recursive algorithm, we need to initialize the svstolic array with an 
initial parameter estimate 0(0) and initial covariance matrix(6,1 in equation 2.14). The 
choice of these initial values will be discussed in this section. In equation 2.14, 6, 1s 
related to the confidence we have on the initial condition 0(0). If some prior 
information about 9(0) is available it should of course be used for determining suitable 


values of 0(0). In particular as we have mentioned in Chapter III at the beginning of 
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the new Clock I, 4) We use the estimate obtained at the previous time block. If no 
? 


prior information is available, the most common choice is to take 
Q0) = 0, Plo)=6,71 . | (4.5) 


ae : ae 
where 6,~ is a large number for fast convergence. In this research we use different 


values of (0) for comparison of the results. 


B. PARALLEL ALGORITHM SIMULATION 
The program which ts included in this research report in the Appendix is used to 
investigate the parallel algorithm on the basis of the svstolic array presented in the 


previous chapters. Consider a plant with discrete time transfer function 


Bee 
H(z) = —— oo (4.6) 
(eee te a) Z~ =e a5Z SE a 


The plant has two zeros at z = O and three stable poles in the z-plane. The 


difference equation associated with the transfer function (4.6) can be expressed as 


pay + ay y(t-1) epMita2 aay (to) = yu t-1) (4.7) 


For identification purpose the input is “sufficient rich” in frequency in order to 
excite all modes of the system. This translates in the requirement that the input u(t) 
must contain a sufficient number of sinusoids. In particular u(t) should contain at 
least 4n different sinusoidal components as mentioned in Chapter II. Writing (4.7) in 


regression form we obtain 


1 g! Q + v(t) (4.8) 


where 


o'(t) = [ y(tl), y(t-2), y(t-3), ult-L) ] 
and 


41 


_ 
ae 
9 = | a3 
| by 
@ being the vector of the actual parameters of the plant. 
Example 4.1) 


Consider the discrete time system with transfer function, 


H(z) [oa 
ae 
(2 aoe 


linear difference equation 


v(t) ~ LSy(tel) + 0.735(t-2) = 0.1259(123)) eae (4.9) 


and input sequence 


u(t-1) = cos(nz/10) + cos(n7/5) + cos(nnt/2) + cos(snr/5). 


We applied the parallel identification algorithm using 6, = 1, and 100, no disturbance 
present and N = 10n as the block length for the triangular systolic array section. The 
corresponding plots are given in Figures 4.2 and 4.3. We can see that in ideal 
conditions all estimated parameters converge to their correct values. The estimated 
weight vector plotted versus time for number of block (represented by NB). 
1. Choice of Optimal Block Length 

In the previous chapters, we set the time block length to N’ 2 )Omeeaiaae 
reason of this has been highlighted in Chapter I, and it gives a sufficient condition for 
global stability in adaptive control. In the simple case of parameter estimation we do 
not have to maintain this restriction. An important factor is to achieve fast 
convergence, whenever possible and to find an optimal block length for fast 
convergence. The influence of the block length on the convergence rate will now be 


illustrated by means of simulation in example 4.2. 
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Example 4.2)( choice of optimal block length ) 


For the illustration of optimal length, we have used the same system shown in 
equation (4.9) with different block lengths. It has been simulated with 3 different block 
lengths without noise and with noise condition from equation error ( S/N = 10 ) in . 
order to test the convergence rates in the different cases. So, we have used the same 
amount of clock cycles for the comparison of each results using different block lengths. 
It means that the scale of the X-axis has exactly the same amount of clock cycles. The 
numerical computation for the required number of blocks are summarized in Table | in 


order to expressed the number of time blocks in each simulation. 


TABLE 1 
REQUIRED BLOCK NO. 


Block oo | Entire Block Length | Required 


Per Ome clock (N= 2d) Cycl Block NO. 





The results are shown in Figures 4.4 through 4.6 without noise condition, and 
the corresponding outputs with noise condition are in Figures 4.7 through 4.9. 

It turns out that the estimated weight vector shows the fastest convergence 
rate when used with N = I5n. Even thought the estimated parameter a; converges 
very fast at N = 5n, other estimated parameters exhibit slow convergence rate 
compare to the case of N = I5n, regardless the noise condition. Therefore the optimal 
choice of block length must be the case when N = 15n, and there is one more 
interesting comment in these results. We realized that within certain limits, the longer 
the block length, the faster the convergence rate, but when we extended the length N 
= 20n, the convergence rate of that result is simular to the output of N = 10n, and for 
the more accurate comparison we can look. at the Figure 4.3 (N=10n) and 4.7 


(N=20n). As it were, the conclusion with this example is that the convergence rate to 
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be increased acccrding to increasing of the block tength, but if the bleck length exceed 
certain saddie point then the convergence sate starts to decrease. This optimal block 
length (N= I5n} is applied for the remaining simulation studies. 
2. Application to Systems with Varying Parameters 

A desirable application of adaptive identification is the tracking of time 
varying parameters. Then there is no obviously tradeoff between tracking ability and 
noise sensitivity [Ref. 3: p. 274] and it will be impossible to exactly follow parameters 
with a high rate of change. However, a slow time variation of the actual parameters 
can be often be tracked reasonably well. To illustrate this approaches consider the 


example below. 
Example 4.3) ( tracking of parameters with time-varying ) 


Consider the svstem with model as in equation (4.11) and time varying parameters. 
We simulated the input-output behaviour of the svstem for a number of block 


(represented by NB) = 150 using the parameter values 


des -1.08 if 30 = NB Ss 60,30 = Neaeaze 
a5 = -0.75 otherwise 


The parameter estimates obtained with this example are displayed in Figure 4.10. 
From the result, we can show that all estimated parameters converged very 


Well to their expected values. 


Example 4.4) ( tracking of parameters with noise ) 
We take the two different types of model set for simulation in this example. 
For the case of equation error, the system ts 


y(t) = ayy(t-l) + agy(t-2) + azy(t-3) = byu(t-1) +v(t) (4.10) 
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The other’case for output errors 


(4.11) 


y(t) Pay ta) dy (tems een eee mo byu(t-1) + v(t) + a,Vv(t-el) + a5v(t-2) 7 agai teaE 


where v(t) is white noise of zero mean for both systems. In this simulation, all 
parameters gain are used in equation (4.9) and tested with different input-signal to 
noise ratio, 1.e., S/N = 10 and S/N = 2. The corresponding resuits are in Figures 4.11 
and 4.12 for using equation error and Figures 4.13, 4.14 are from output error model. 
In these results, the outputs of equation error exhibit less variations the results 
from output error. The comparison of the two model sets have not been carried out 
because there is no general result that clearly factors either model set. The amplitude 
of variation alwavs depends on the system itself. 
Final simulation is extended of Example 4.3 with two different types of noise 


model (S/N = 10) for the verification of parallel algorithm and illustrated in Example 
4.5. 


Example 4.5)( tracking with time-varying parameters and noise ) 
We simulated the combination of two system shown in Example 4.3 and 4.4. 


The systems are given by 
Mitye ayy(tel) = a5y(t-2) aay Seo Ce een 
for equation error, 


and 


si eae a;y(t-L) 7 agvil-2) 1 day tea) = Voges ea a, v(t-1) oe ayV{t-2) a a3 V(t-3) 


for the output error case. The parameters above change as follows 


aj;= 18 

a> = -1.08 50 SeNBe= CORO = Nb 
aga) 210 

Dee 
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a= -0.75 otherwise 
az = 0.125 


The results are given in Figures in 4.15 and 4.16. 

All parameters are converged very well under time-varying and noise 
condition. 

in the discussion so far, We simulated several possible different conditions to 
investigate the behaviour of the parallel algorithm. The simulation study shows that 
we can obtain a reasonable parameter tracking in the presence of noisy measurements. 
The time block approach averages out the effects of random noise and provides an 


adjustable flexibility to improve the convergence rate of the parameter estimates. 
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V. PERFORMANCE COMPARISON AND CONCLUSION 

A. COMPUTATIONAL PROBLEM OF SERIAL RECURSIVE LEAST- 

SQUARES 

The estimation of the model parameters using a parallel structure and standard 
least-squares algorithm has been described in Chapters [I and [II]. We have discussed 
two possible ways of implementing the estimation algorithm: serial as in equation 
(2.13) and parallel as described in Chapter III. In this section we want to summarize 
the various trade-offs of the two algorithms, on the basis of their complexity in terms 
of computational power required. In particular consider the number of algebraic 
Operations required for one iteration as a comparison of the two algorithms. For the 
case of serial implementation consider the two recursions in equation (2.13) which both 
update the covariance matrix as well as the parameter estimates. From a numerical 
standpoint the recursion of the covariance matrix P(t) exhibits the highest complexity, 
since it requires a number of operations which encreases with d?. dX d) bememaie 
dimensions of P(t).. Also it 1s worth mentioning that P(t) is not updated by direct 
application of the recursion in (2.13) but it is rather computed by LUD factorization 
[Ref. 3: p. 336] in order to preserve its positive definiteness in the presence of numerical 
errors. In this way it is possible to compute the number of arithmetic operations 
required for matrix inversion and parameter estimation. Table 2 shows the total 
number of arithmetic operations for updating the parameters and also can be used to 


determine the time required per iteration. 


AB ee 
NUMBER OF ARITHMETIC OPERATION FOR UPDATING TiNiieas 


Addition 


Multiplication 


Division 


Subtraction 
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As shown in Table 2, for example, let us consider a 3rd order autoregressive mode! 
with three poles as in example 4.1. In this case the dimension of @(represented vy d) is 
4, and a total 34 additions, 70 multiplications, 8 divisions and 1 subtraction are 
required fer each parameter updating It is clear that the sampling time, T, encreases 
with the square of @(d) in the standard recursive least-squares. As a consequence of 
this we can see that the rate at which we can enter the data @(t) into the algorithm is 


F 9) 
proportional! to 1/d* 


B. COMPUTATION OF PARALLEL STRUCTURE 

In the case of parallel structure, the computations required by the limitation in 
the maximum sampling frequency are performed in each cell at each clock cycle. As 
discussed in the previous chapters the estimation is performed by two distinct 
processors: P! which performs matrix triangularization and operates within each time 
block, and P2 which solves the triangular svstem and it operates at the end of the time 
block. Each of these two processors has different complexities as shown below. The 
number of operations in each cell of the triangularization processor Pl! is shown in 
Table 3. Since the cells operate in parallel, and in a pipeline fashion, at the end of each 
computing cycle we can enter new data, so the throughput(1.e., the rate at which we 
can feed new data into the processor) is constant and independent of the complexity of 
the system. We can say therefore that in this case the sampling time is of order T, = 


Pp 


. ee es 
O(1), constant, compared with T.. = O(d*) in the recursive implementation. 


P 


TABLE 3 
NUMBER OF ARITHMETIC OPERATION IN PARALLEL STRUCTURE 


Section Triangular Systolic 


Addition 1 2 
3 4 
I 
| 


Multiplication 
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Division 


Subtraction 


On the other hand, the sena! tmpiementation, the estimated parameters are 
available at any instant of time. This is not the case for the parallel algoritnm 
presented since it has to wait until the end of the block to solve for the linear system of 
equation. Let us consider the time interval for parameters updating between both 
cases of the serial computation and parallel structure. This time is related to the 
amount of serial computation required by the two algorithm. 

For example, let us consider the particular case shown Table 2 for serial 
computation, and assume the time block to be of length N = 10n (n = number of 
order) in the parallel structure. Although we cannot sav a priori which cell takes 
longer computing time, bv refering to Table 3 we can conjecture that the internal and 
the boundary cells of triangular svstolic array perform more complex operations than 
the linear array section. In Table 4, we compute the number of arithmetic operations 
to performed for each clock cycle in the serial computation and one entire time block 
(N+ 2d) in the parallel structure shown in Figure 3.9. The former one based on either 
the number of operations in internal cells or boundary cells of the triangular svstolic 


array. 


EAB LES 
COMPARISON OF SERIAL AND PARALLEL STRUCTURE (DIM.®M= 4) 


Serial Computation Parallel Computation 


Addition 38 76 
Multiplication 104 [ae 


Division oo 


Subtraction 





As shown in Table 4, it is clear that the number of required arithmetic operations 
in the serial computation per clock cycle is approximately half the number of 
operations needed for the entire time block (N+ 2d) in the parallel structure. Since the 
number of operations is related with the time interval between parameters updating, we 


can have an idea of the rate at which we update the parameters. However we must 
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keep in mind that the serial algorithm updates the parameters on the basis of one 
sample onlv, while the parallel algorithm uses all samples within the time block. the 
benefit of this is a better performance of the parallel algorithm in case of noisy signals. 
This is in line with the fact discussed earlier that the sampling rate for the parallel 
algorithm can be much faster than the seria! one, thus allowing more data to be 


processed in a given amount of time. 


65 


APPENDIX 
COMPUTER PROGRAM OF PARALLEL ALGORITHM 


He fe fe oie ote fe A oe fe 


This program calculated the estimated weight vector of least-squares using 


parallel algorithm, the input is a coefficient of polynomial ; the form 


; Byz"! 
NZ) = 
[ An Zep eee ore 2 


where the all coefficients are given interactively and real. This program is limited with 
system model as shown above transfer function, same type of denominator with Ist, 
2nd, and 3rd order cases and one input parameter. If run this program, should be 
extended virtual storage capacity to 2M in advance due to the large array. This 
program consists of main and four different subroutines which are performed each 


cell’s function. 


VARIABLE DEFINITION 
KAKKKAKARAKAAKAKAAKARAKAKKAAKKAKK 


REAL A(300,5,300), U(309 5,300) ,XIN(5,5),CINP(5,5),SINP(5,5) 

REAL COUP(5_5),SOUP(5,5),SOUN(5,5),CINN(5,5),SI NN(5, 8) ROUT (7a 
REAL $X(5,5),A1,B1,YNP,YP,INUP,SIG,PI,COUN(5,5) ,EA1,EB1L,NSX(5,5 
REAL A2,YPM1,3BB, ZA2 IPM, SAL SBI, PI10,2IN(5,5) ,ZOUT(S5,5) 

REAL WK(5,5),REB(12),AM,V,S,DIN(6,6,15) ,DOUT(7,7,15) 

REAL XLIN(7,7,15),WKIN(5, 6)’ WK1(885,2IB 

INTEGER K,L,M,NT,IK,IL,NBN,X1,L1,N,1T,NI,IX,NM1,NB,X2,L2,KR,LR,NC 


AAKAKRKK TNITIALIZE THE GIVEN TRANSFER FUNCTION AND ALL VARITABIESI ae 


1000 PRINT * "ENTER THE SIZE OF MATRIX N BY N ** N ?! 
READ *. 
PRINT a N= '0N 
PRINT *' 
PRINT *,'ENTER THE ORDERS OF DEN. OF DIFF. EQ.**NO ?'! 
READ *,NO 
PRINT * ,'NO = ',NO 
PRINT *, 
PRINT *, aon MANY BLOCK DO YOU WANT TO ITERATE **NBN ?! 
READ *,NBN 
PRINT *,'NBN = ',NBN 
IF ( NO .EQ. 1) on 
PRINT *, * 'FORMAT OF 1ST ORDER DIFF.EQ.***Y(Z) = B1*U(Z-1) + Al*Y(Z- 
PRINT *,'ENTER THE COEFF. OF Al ?' 
READ * Al 
PRINT *,'Al = ',Al 
PRINT *,'ENTER THE COEFF. OF Bl ?! 
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fee, st = ' 61 
WRITE (6,7C0) 
E@oniF ( NO. .EO. 2) TEEN 


PRINT *,'FORMAT OF 2ND ORDER 
A1)- A2*Y ¥(Z- 2 ' 

PRINT *,'ENTER THE COEFF. OF 
READ *,Al 

PRINT *,'Al = ',Al 

PRINT *.'ENTER THE COEFF. OF 
READ *,A2 

PRINT *,'A2 = ',A2 

PRINT *.'ENTER THE COEFF. OF 


WRITE (6,720) 

ELSEIF ( NO .EQ. 3) THEN 

PRINT *.'FORMAT OF 3RD ORDER 
aie AOKY(Z-2) + A3Z*Y(Z-3)! 

PRINT *,'ENTER THE COEFF. OF 

READ *,Al 

PRINT *,'Al = ',Al 

PRINT | xy , SENTER THE COEFF. OF 

READ 

PRINT ee ae = ' a2 

PRINT *,'ENTER THE COEFF. OF 

READ * 

PRINT *,'A3 = ',a3 

PRINT *, , ENTER THE COEFF. OF 

READ * 

PRINT i ce = 1 

WRITE (6,740) 

ELSE 

PRINT *,'ERROR ** PLEASE TRY 

GOTO 1000 

END IF 

NML=N-1 

NB =15*NO 

mG = 1. 

PI = 3.141592 

PI10 = PI/10 


i 1 
S = 0, aa 


0. 

Or. 
0 
0 
Q 


DIFF .EQ.***Y(Z) = B1*U(Z-1) + A1*Y(Z- 
Ble 


Aca 


Ble 2)! 


DiFR EO.7~1(2) = BIC 2-1 jee Al*Y(Z- 
aie 


OR 
poe 


Bale 


AGAIN AMONG THOSE 1,2,3 FOR NO' 


MUP e—meos(PT10) + COS(PI10*2) + COS(PI10*5)+ COS(P1I10*6) 


AAKKX MAIN PROGRAM BEGIN ****xkxx 


Poul Ik = 1 
DO 20 IL = 1 
IF Te 


~~ 


IL Peer eet 


AGU Ul 


Q. 
COUP(IK,I ae 
SOUP(IK,IK)=0 
a Ail) =" Si 
SX(IK,IL) = 0. 
END IF 


IF (ik iS N) THEN 
ZIN(IK, IL 
WK(IK,IL) = 

END IF 

IF (K NE. 1) THEN 
XIN(IK,IL)=0 


G 
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nie ti) oe) 


END iF 
20 CONTINUE 
10 CONTINUE 


DO 330 IB = 1,NBN 
DO 30 NT-= 1,,NB 


RRAKRAK TIME VARYING INPUT ****xxxxX 


e IF (((IB.GT.30).AND.(IB.LT.60)).OR.((IB.GT.90).AND. (IB.LT.120))) 
C * THEN 
@ ere 
C 2 sane 
C AS = Ona 
C Bi 226 
E ELSE 
E AG iL oS 
é A2e=s0o 
C hoon 
@ Bie is 
C END IF 
IF (NT .LE. NB-N+l) THEN 
CALL GAUSS(IX,S,AM,V) 
IF (NO .EO. 1 ) THEN 
YNP = B1IXINUP - Al1*YP 
C 2NE = BL*INUP + Al*(YP+V) 
=e 
A(NT,JF,NT) = YP 
ae = INUP 
A(NT,JF+2,NT) = YNP 
ELSEIF ( NO .EQ. THEN | 
C YNP = BL*INUP + Al*(YP+V)- A2*YPM 
C YNP = B1*(INUP+V) - A1*YP 
YNP = SL*INUE + AL*YP-AC*YPM 
JF = 
A(NT,JF,NT) = YPM 
A(NT,JF+1,NT) = YP 
A(NT, JF+2,NT) = INUP 
A(NT,JF+3,NT) = YNP 
ELSEIF ( NO .EQ. 3) THEN 
C YNP = BL*INUP + Al1*YP- A2*YPM+A3*YPM1+V+A1*VM1+A2*VM2+A3*VM3 
YNP = B1*INUP +Al1*YP- AZ*YPM + A3*YPM1 +V 
c ¥NP = B1*INUP + AL*YP-A2*YPM+ A3*YPM1 
A(NT,JF,NT) = YPM1 
A(NT,JF+1,NT) = YPM 
A(NT,JF+2,NT) = YP 
A(NT,JF+3,NT) = INUP 
A(NT,JF+4,NT) = YNP 


END ITF 
AKKAAKA TRANSFORMATION OF INPUT DATA WITH SKEW ORDER ***x*x* 


DO 40 J = 1,N 

IT = J-1 

NI = NT+IT 

ONT, J NI)o= ACNTed a) 
40 CONTINUE 

END IF 


kkRKKK CALCULATE THE ORTHOGONAL TRIANGULAR MATRIX BY GIVENS 
rotation AND INITIALIZE THE ORTHOGONAL MATRIX ****x*% 


AkRKKKK CALCULATE THE BOUNDARY AND INTERNAL CELLS ******* 
DO 50 K = 1,N-l | 


DO 60 L Lat 
HT CIN seh I shale 
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-AND. (K+L .LE. NT+1))THEN 
THEN 
ih, & NT) 


AAARKAK CALCULATE THE BOUNDARY CELL ****%*% 


ihe) K GEO. Le THEN 
CALL BND Oe L,NT,XIN,SX,COUP,SOUP) 
SNS Dee = COUP (K. re 


Saritk, bti) = SOUP(K,L 
eect .E NB) THEN 
Se Lil} = Q. 
Suk, btl) = <1. 

BD ee 


RARKARK CALCULATE THE INTERNAL CELL **** 


Boro Keele tHe 
CALL INRN (K,L,NT,XIN 


XOUT,SX,NSX,CINP, SINP) 
GINN (141) = CINP oe 


BiNN (he L+l) = SINP(K,L 
Sack,b) = NSX(K,L) 

END IF 

Biome (ak .Gl. Lb) THEN 
SX(K,L)=0. 

BIDE 


60 CONTINUE 
Bom CONTINUE 


Bom o” Kl = 1, . i 

DO 80 Li=1l, 

iteecwk) .NE. ii) THEN 
Bor ,»ul) = a a 
SEC KT LL) = SINN(K1,L1 


END IF 

XIN(K1+1,L1) = XOUT(K1,L1) 
80 CONTINUE 
70 CONTINUE 

IF ( NO .EQ. 1) THEN 

YP = YNP 


ELSEIF ( NO .EQ. 2) THEN 
meth = YE 
eee LN 


ELSEIF No .=Q. 3) THEN 
YPM1 = Y 

YPM = yP 

YP = YNP 

VM3 = VM2 

VM2 = VM1 

VM1 = V 

END IF 

ier 


INUP = COS(PILO*NX)+COS (PILO*NK*2)+COS (PILO*NK*5 )+COS (PI10*6*NX) 
ARAAKK CALCULATE THE ESTIMATED COEFFICIENT WITH RESET ****xxx 


= 1,N 

EOeelO JM = 1,N 
ROUT(IM,JM) = a0 
XIN(IM, JM) = 

610 CONTINUE 

600 CONTINUE 
DO 400 NR = 1,2*N-1 
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DO 410 ZR = 1,N 
DO 420 LR = 1,N 
TF((XR .NE. LR) .AND. (KR .LT. N)) THEN 


’ CX CT > 
CALL INRN (XR LR NR XIN XOUT SX NSX CINP SINE) 
Ile (LR NE. N ) THEN 

DIN?KR,LR,NR) = XOUT(KR,LR) 

END IF 

SINNT KR ERE = Seat 

SINN(KR,LRtL) = SINP(KR,LR 

SX(KR,LR) = NSX(KR,LR) 

Peoe aC ik sO ao oa NDE (KR aLes N) ) THEN 
XIN(KR,LR) = 0. 


CaLL BND(KR,LR,NER SIN Sx, COUP, SOUP 
RARARARAARARARAKAAKARARRARARARRRRR 
Se Ee = Soro oak 

= SOUP 


SINN(KR,LR+1 KR,LR 
END IF 

IF(KR EQ. N) THEN 

IF (LR .EQ. N) THEN 


CALL LINBD (N,KR,LR,NR,ZIN,XIN,WK,WK1,SX,REB,NC 


KREARKARKRKKAKKKX KAKA AKKAAREKRARREREKKRKRARRARKRARRER 


IF ((NR-£Q. 3).OR.(NR.=Q.5).OR.(NR -EQ. 7).OR.(NR .EQ.9)) THEN 
END IF : 

ELSEIF ((LR .GT. 1) .AND. (LR .LT. N)) THEN 

CALL LINNT(KR,LR,NR,ZIN,XLIN,WK,WKIN, ZOUT 


(erie ite ath bree on * 


RRERRKAKKAAKAKAK RANK RAK RRA AARKAAAKARAKARER 
END IF 
END IF 

420 CONTINUE 

410 CONTINUE 
DG 430° K2 = 1N 
DO 440 L2 = 1,N 





IF (( K2 ..NE. &2 ) -AND. (K2 LT. NOMeHET! 
CINP ae = gee 

SINP(K2,L2) = SINN(K2,L2 

END IF 


IF (L2 .LT. N-1 ) THEN 
XIN(K2+1,L2+1) = DOUT(K2,L2,NR) 
DOUT(K2,L2,NR+1) = DIN(K2,L2,NR) 
ELSEIF (L2 .EQ. N-1) THEN 
XLIN(N,K2+1,NR+2) = DIN(K2,L2,NR) 
ELSEIF (L2 .20. N) THEN 
XIN(K2+1,L2) = XOUT(Z2,L2) 

END IF 

IF (K2 .EQ. N) THEN 

ZIN(K2,L2+1) = ZOUT(K2,L2) 

ie (ce een THEN 


WKIN(K2,L = WKIECKZ ea 
BESS 

WRON(K2, b2=1) >= WKi ka; ig) 
clID ee 

END IF 


440 CONTINUE 
430 CONTINUE 
400 CONTINUE 


AAAKAKK REINITIALIZE FOR NEW BLOCK PROCESSING “*~%~ 


DO 470 N3 = 1,3*N-3 
DO 450 K3 = 1.N 

DO 460 L3 = LN 

IF i .NE. 1)THEN 
XIN(K3,L3) = 0. 

END IF 
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KEeen(K3,43,N3) = 0. 
DIN(K3 13 3N3) = 0. 
NOUR (KS Le ee = 0. 
eMe< Soe bie lt) PEN 
SINE (RS. jbee 


ale 
SINP(K3 35 = 0. 
IF ( X23 5 sEQ L3 ) THEN 
Sx (K3 13 7 
SX(K3,L3) 
END IF 
ELSEIF ( K3 3EQ. N ) THEN 
WK(K3,L3) = 
ZIN(K3,L3) = 
END IF 
460 CONTINUE 
450 CONTINUE 
470 CONTINUE 
ab = 0B 
IF ( NO .EQ. 1) THEN 
WRITE (6, 710) IB ,REB (5), REB(3) 
oe 3 RE Seve 
Sx(2,3 REB(3 *21G 
ELSEIF ( NO EQ. 2) THEN 
WRITE (6, 730) 1B ,REB(5) ,REB(7) , REB(3) 
REB(7)*SIG 


SX(1,4 
Sx(2,4 REB(5)*SIG 
SX(3,4 REB(3)*SIG_ 
ELSEIF ( NO .EQ. 3) T 


WRITE (6, 750) ana eEB (5), PREG? REB (9) >REB(3) 
aS 


i 
© 


les) — 

Sxezaish) = REB 7)*SIG 
SX(3,5) = REB(5)*SIG 
SX(4,5) = REB(3)*SIG 
END IF 

END IF 


30 CONTINUE 
S50, CONTINUE 


700 FORMAT ee 3x BLOCK NO. ',6x,'COEFF. Al',9X,'COEFF. B1') 

710 FORMAT (3X,14,11X,2(F10. 7, Sap 

720 PO '3X. BLOCK NO.!,6X,'COEFF. Al',9X,'COEFF. A2',9X,'COEFF 

730 FORMAT a 14, wip ae 7 roe 

7140 FORMAT ('1',3X, Ce ,OX, ‘COEFF. Al',9X,'COEFF. A2',9X,'COEF 
AF. A3!,9X, ‘aoe ie 

750 eee (3X, F4.0, 12x 4(F10.7,7%)) 


AAKKKKAKAK SUBROUTINE GROUP FOR CELL'S FUNCTION ***xxxxxxX 
KAKKAKKKKAKKKKARKAKKKAKKKKRAKKRKAAK KK 


SUBROUTINE BND (K NT, AIN,S&X,COUP,SOUP 
KKKKKKKKRKKKKKK BNE eae eet ee er 


REAL BENS ev oor 5) FCOUR( 5,5) ,SOUF( 5,5) 


REAL 
INTEGER K,L,NT 

IF ( XIN(K,L) .EQ. 0) THEN 
rie. =], 

SOUP (K.L)=0. 

SX(K,L) = SX(K,L) 

ELSE 

NSO = SK(K,L)*A2+KIN(K, L)**2 
Beak Se 
SOUP(K,L)=XIN(K,L)/NSQ 

SX(K, L)=1 

END IF 
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RETOPRN 
ENC 


SVBROUTINE INRN (K,UL,NT, XIN XOUT SA (NSA, CINr sowie 
KRRKKKKRKERR KEKE ARERR RRR KERR RRA KERAKKARRRRARR KK 


REAL XIN(5,5),SX(5,5),CINP(5,5),SINP(5,5),XOUT(5,5) ,NSX(5,5) 
INTEGER K,L,NT 
XOUT(K,L)=-SINP(K,L)*SX(K,L)+CINP(K,L)*XIN(K,L) 
NSX(K,L)=CINP(K,L)*SX(K, L)+SINP(K,L)*XIN(K,L5 
RETURN 

E 


SUBROUTINE LINNT(KR,LR,NR,ZIN,XLIN WK WKIN ZOUT ) 
KRERAKRKKKKREKR KA KAK AKER KERR REAR RRR RE KAKKERERE 

REAL ZIN(5,5),XLIN(7,7,15),WK(5,5),Z0UT(5,5),WKIN(S5,5) 
INTEGER KR,LR,NR 

ZOUT(KR,LR) = ZIN(KR,LR) + XLIN(KR,LR,NR)*WKIN(KR,LR) 
WK(KR,LR) = WKIN(KR,LR) 

RETURN 


END 


SUBROUTINE LINBD(N,KR LR NR ,ZIN, XIN, WK WRI Sx REB NC) 
RRR RAK RRA RAK RR EEK KERR RAKE RRKRERERERREKRRERE 
REAL ZIN(S,5),XIN(5,5),WKC5,5) ,WKL(5, 5 )esm5 5 ee ore oe 
INTEGER KR,LR,NR,N,NC 

WK(KR,LR) = SX(KR,LR) 

WK1(KR,LR) = XIN(KR,LR) - ZIN(KR,LR) 

SX(KR,LR) = WK1(KR,LR) 

REB(NR) = WK( KR,LR) 

RETURN 

END 


a 


in 
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